mla #1280
Open
feifei14119 wants to merge 6 commits into
Pull request overview
This PR updates ATOM’s MLA attention stack to support/use segmented MLA KV-cache kernels and a configurable MLA page size, and propagates the new fused “_seg” kernel entrypoints into vLLM/SGLang plugin integrations.
Changes:
- Add `ATOM_MLA_PAGE_SIZE` env var and use it to configure MLA metadata builder block/page sizing.
- Switch multiple call sites to segmented fused MLA cache-update kernels (`*_mla_seg`) and add segmented-layout handling/validation in `MLAAttention`.
- Adjust MLA decode/prefill paths to pass/use the actual KV cache page size (instead of implicitly assuming 1).
Reviewed changes
Copilot reviewed 5 out of 5 changed files in this pull request and generated 5 comments.
| File | Description |
|---|---|
| `atom/utils/envs.py` | Adds `ATOM_MLA_PAGE_SIZE` env var for configuring MLA page/block sizing. |
| `atom/model_ops/attentions/aiter_mla.py` | Uses `ATOM_MLA_PAGE_SIZE` to set the metadata builder’s `block_size`. |
| `atom/model_ops/attention_mla.py` | Implements segmented KV-cache layout support, adds validation, adjusts page-size handling, and updates kernel call paths. |
| `atom/plugin/vllm/attention/layer_mla.py` | Updates vLLM plugin calls to use segmented fused MLA cache-update kernel. |
| `atom/plugin/sglang/models/deepseek_mla_attention.py` | Updates SGLang plugin call to use segmented fused MLA cache-update kernel wrapper. |

            os.getenv("ATOM_USE_TRITON_MLA_SHUFFLE_KV", "0") == "1"
        ),
        "ATOM_USE_TRITON_MOE": lambda: os.getenv("ATOM_USE_TRITON_MOE", "0") == "1",
        "ATOM_MLA_PAGE_SIZE": lambda: int(os.getenv("ATOM_MLA_PAGE_SIZE", "1")),
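
The `ATOM_MLA_PAGE_SIZE` entry above parses the variable with a bare `int(...)`, so a non-numeric or non-positive value would only surface later, if at all. A minimal sketch of a stricter reader is shown here; the helper name `read_mla_page_size`, its `environ` parameter, and the positivity check are assumptions made for illustration and are not part of this PR.

```python
import os


def read_mla_page_size(environ=None, default=1):
    """Hypothetical helper: parse ATOM_MLA_PAGE_SIZE as a positive int."""
    # Allow a plain dict to be passed in so the parser can be tested
    # without touching the real process environment.
    env = os.environ if environ is None else environ
    raw = env.get("ATOM_MLA_PAGE_SIZE", str(default))
    try:
        value = int(raw)
    except ValueError as exc:
        raise ValueError(
            f"ATOM_MLA_PAGE_SIZE must be an integer, got {raw!r}"
        ) from exc
    if value < 1:
        raise ValueError(f"ATOM_MLA_PAGE_SIZE must be >= 1, got {value}")
    return value
```

Failing at read time keeps a bad setting from reaching the metadata builder as a block size; whether the project wants that strictness is a design choice for the author.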
Comment on lines 842 to 846

    # DEBUG(seg): zero-init instead of empty so any region the decode asm
    # does not write shows up as 0 rather than garbage (isolates
    # uninitialized-read bugs in the seg pass).
    o = torch.zeros(
        B,
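
The debug comment relies on a known fill value to expose regions that the kernel never writes. The same idea can be sketched in plain Python, without torch: fill the output with a sentinel, run the writer, then list the rows that still hold the sentinel. `fake_decode` and `unwritten_rows` are hypothetical stand-ins and not functions from this PR, and NaN is swapped in as the sentinel instead of 0 because 0 can also be a legitimate output value.

```python
import math


def unwritten_rows(buf):
    """Return indices of rows where every element is still the NaN sentinel."""
    return [i for i, row in enumerate(buf) if all(math.isnan(x) for x in row)]


def fake_decode(buf, rows_to_write):
    """Hypothetical stand-in for a kernel that writes only some rows."""
    for i in rows_to_write:
        buf[i] = [float(i)] * len(buf[i])


# Three output rows of width 4, pre-filled with the sentinel.
out = [[math.nan] * 4 for _ in range(3)]
fake_decode(out, rows_to_write=[0, 2])
missing = unwritten_rows(out)  # rows the writer never touched
```

A check like this is only a debugging aid; once the write coverage is confirmed, an uninitialized allocation avoids the extra fill.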
Comment on lines 942 to 946

    # DEBUG(seg): zero-init instead of empty so any region the decode asm
    # does not write shows up as 0 rather than garbage (isolates
    # uninitialized-read bugs in the seg pass).
    o = torch.zeros(
        B,

    # ids at block granularity, so PAGE_SIZE must be the real KV cache
    # block size for the kernel's page// and intra-page% addressing.
    page_size = get_current_atom_config().kv_cache_block_size
    logger.info("triton_mla decode: page_size=%d", page_size)
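
The comment above describes `page//` and intra-page `%` addressing: a token's logical position is split into an index into the block table and an offset inside that block. A minimal sketch follows, assuming a per-sequence block table that maps logical page index to physical block id and a flat slot numbering of `block_id * page_size + offset`. The function name `kv_slot` and this layout are illustrative assumptions, not the kernel's actual code.

```python
def kv_slot(token_pos, block_table, page_size):
    """Map a token position to a flat KV-cache slot (hypothetical layout)."""
    if page_size < 1:
        raise ValueError("page_size must be >= 1")
    page_idx = token_pos // page_size  # which entry of the block table
    offset = token_pos % page_size     # position inside that block
    return block_table[page_idx] * page_size + offset


block_table = [7, 2, 5]  # physical block ids for logical pages 0, 1, 2

# Token 5 with 4 tokens per page: logical page 1 (block 2), offset 1.
slot_real = kv_slot(5, block_table, page_size=4)

# The same table read with the wrong page size resolves a different slot.
slot_right = kv_slot(2, block_table, page_size=4)
slot_wrong = kv_slot(2, block_table, page_size=1)
```

Under these assumptions, a block table built at block granularity but indexed with a page size of 1 picks the wrong block, which is the mismatch the comment warns about.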
Comment on lines +1201 to +1209

    q_out = torch.zeros(
        (
            q_nope.shape[0],
            self.num_heads,
            _MLA_Q_OUT_PADDED_DIM,
        ),
        dtype=attn_metadata.dtype_q,
        device=q_nope.device,
    )
76d4fb7 to ff20bd8
98b0cf0 to 4c4a325
Motivation
Technical Details
Test Plan
Test Result
Submission Checklist